Accurate Prediction of Translation Initiation Sites by Universum SVM
Abstract
Extracting protein sequences from nucleotide sequences requires recognizing the points at which protein-coding regions begin. These points are called translation initiation sites (TIS). The task of recognizing TIS can be modeled as a classification problem. In this paper, we address it with a new pattern classification algorithm recently proposed by Vapnik. Numerical experiments show a considerable improvement of this method over the leading existing approaches.
Similar resources
Practical Analysis of the Universum SVM Learning
The idea of ‘inference through contradictions’ was introduced by Vapnik [1] to incorporate a priori knowledge into the learning process. This knowledge is introduced via additional unlabeled data samples (called virtual examples or the Universum) that are used along with labeled training samples to perform inductive inference. For example, if the goal of learning is to discriminate ...
Universum Learning for Multiclass SVM
We introduce Universum learning [1], [2] for multiclass problems and propose a novel formulation for the multiclass Universum SVM (MU-SVM). We also propose a span bound for MU-SVM that can be used for model selection, thereby avoiding resampling. Empirical results demonstrate the effectiveness of MU-SVM and the proposed bound.
Ensemble Universum SVM Learning for Multimodal Classification of Alzheimer's Disease
Recently, machine learning methods (e.g., support vector machine (SVM)) have received increasing attention in neuroimaging-based Alzheimer’s disease (AD) classification studies. To distinguish AD patients from normal controls (NC), a standard SVM trains a classification model on AD and NC subjects only. However, in practice, besides AD and NC subjects, there may also exist other subjects such ...
Accurate Splice Site Detection for Caenorhabditis elegans
We propose a new system for predicting the splice form of Caenorhabditis elegans genes. As a first step, we generate a clean set of genes from available expressed sequence tags (EST) and complete complementary DNA (cDNA) sequences. From all such genes we then generate potential acceptor and donor sites as they would be required by any gene finder. This leads to a clean set of true and decoy splice si...
Empirical Study of the Universum SVM Learning for High-Dimensional Data
Many applications of machine learning involve sparse high-dimensional data, where the number of input features is (much) larger than the number of data samples, d ≫ n. Predictive modeling of such data is very ill-posed and prone to overfitting. Several recent studies for modeling high-dimensional data employ a new learning methodology called Learning through Contradictions or Universum Learning du...